Maximizing Tree Diversity by Building Complete-Random Decision Trees

Authors

  • Fei Tony Liu
  • Kai Ming Ting
  • Wei Fan
Abstract

One way to lower the generalization error of a decision tree ensemble is to maximize tree diversity. Building complete-random trees forgoes the strength obtained from a test selection criterion; however, it achieves higher tree diversity. We provide a taxonomy of different randomization methods and find that complete-random test selection produces diverse trees, whereas other randomization methods such as bootstrap sampling may impair tree growth and limit tree diversity. The well-accepted practice in constructing decision tree ensembles is to apply bootstrap sampling and voting. To challenge this practice, we explore eight variants of complete-random trees using three parameters: ensemble method, tree height restriction and sample randomization. Surprisingly, the most accurate variant is very simple and performs comparably to Bagging and Random Forests. It achieves good results by maximizing tree diversity and is called Max-diverse Ensemble.
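As a concrete illustration of the approach described in the abstract, here is a minimal Python sketch of an ensemble of complete-random trees trained on the full sample (no bootstrap sampling) and combined by averaging class-probability estimates rather than by voting. The class name MaxDiverseEnsemble, the stopping rules and the uniform choice of split points are illustrative assumptions based on the abstract, not the authors' implementation; class labels are assumed to be integers 0..K-1.

import numpy as np

class _Node:
    """Internal node (feature, threshold, children) or leaf (class distribution)."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, dist=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.dist = left, right, dist

def _grow_complete_random_tree(X, y, n_classes, rng, depth=0, max_depth=None):
    """Grow a tree by picking the split attribute and split point completely at random,
    i.e. without any test selection criterion."""
    # Stop when the node is pure, holds a single instance, or hits the height limit.
    if len(y) <= 1 or len(np.unique(y)) == 1 or (max_depth is not None and depth >= max_depth):
        dist = np.bincount(y, minlength=n_classes).astype(float)
        return _Node(dist=dist / dist.sum())
    feature = rng.integers(X.shape[1])          # random attribute
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:                                # attribute is constant at this node: make a leaf
        dist = np.bincount(y, minlength=n_classes).astype(float)
        return _Node(dist=dist / dist.sum())
    threshold = rng.uniform(lo, hi)             # random split point inside the attribute's range
    mask = X[:, feature] <= threshold           # with threshold in [lo, hi), both sides are non-empty
    return _Node(feature, threshold,
                 _grow_complete_random_tree(X[mask], y[mask], n_classes, rng, depth + 1, max_depth),
                 _grow_complete_random_tree(X[~mask], y[~mask], n_classes, rng, depth + 1, max_depth))

def _class_distribution(node, x):
    """Walk one instance down to a leaf and return that leaf's class distribution."""
    while node.dist is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.dist

class MaxDiverseEnsemble:
    """Complete-random trees trained on the full training set, combined by probability averaging."""
    def __init__(self, n_trees=100, max_depth=None, random_state=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        self.n_classes_ = int(y.max()) + 1
        self.trees_ = [_grow_complete_random_tree(X, y, self.n_classes_, self.rng,
                                                  max_depth=self.max_depth)
                       for _ in range(self.n_trees)]
        return self

    def predict(self, X):
        # Average the per-tree class distributions, then take the most probable class.
        probs = np.array([[_class_distribution(t, x) for t in self.trees_] for x in X])
        return probs.mean(axis=1).argmax(axis=1)

On a small numeric dataset this would be used as, for example, MaxDiverseEnsemble(n_trees=100).fit(X_train, y_train).predict(X_test).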

Similar articles

The Utility of Randomness in Decision Tree Ensembles

The use of randomness in constructing decision tree ensembles has drawn much attention in the machine learning community. In general, ensembles introduce randomness to generate diverse trees, which in turn enhances the ensembles' predictive accuracy. Examples of such ensembles are Bagging, Random Forests and Random Decision Tree. In the past, most of the random tree ensembles inject various kinds ... (a brief sketch of bootstrap-style randomization follows the link below)

Full text
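For comparison with the complete-random approach above, the snippet below is a hedged sketch of the bootstrap-sampling randomization used by Bagging (and, combined with random feature selection, by Random Forests). The build_tree argument stands in for any tree-growing routine and is an assumption for illustration, not part of the cited work.

import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw len(y) instances with replacement: the sample randomization used by Bagging."""
    idx = rng.integers(0, len(y), size=len(y))
    return X[idx], y[idx]

def bagging_fit(X, y, build_tree, n_trees=100, random_state=0):
    """Train each tree on its own bootstrap sample; predictions are usually combined by voting."""
    rng = np.random.default_rng(random_state)
    return [build_tree(*bootstrap_sample(X, y, rng)) for _ in range(n_trees)]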

Modeling phonetic context with non-random forests for speech recognition

Modern speech recognition systems typically cluster triphone phonetic contexts using decision trees. In this paper we describe a way to build multiple complementary decision trees from the same data, for the purpose of system combination. We do this by jointly building the decision trees using an objective function that has an added entropy term to encourage diversity among the decision trees. ... (a hedged illustration of such an objective appears after the link below)

Full text
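The entry above does not give the exact objective, so the following is only a plausible illustration of an entropy-regularized joint objective for M trees, where L(T_m) is a per-tree fit term, H is an entropy term over the trees' assignments that encourages diversity, and \lambda is a trade-off weight; the cited paper's actual formulation may differ.

F(T_1, \ldots, T_M) \;=\; \sum_{m=1}^{M} L(T_m) \;+\; \lambda \, H(T_1, \ldots, T_M)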

Efficient Learning of Random Forest Classifier using Disjoint Partitioning Approach

Random Forest is an ensemble supervised machine learning technique. Research in the area of Random Forest aims at either improving accuracy or improving performance. In this paper we present our research towards improving the learning time of Random Forest by proposing a new approach called Disjoint Partitioning. In this approach, we use disjoint partitions of the training datase... (a minimal sketch of this idea follows the link below)

Full text
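As a rough illustration of the idea (the exact scheme in the cited paper may differ), here is a minimal Python sketch that shuffles the training data, splits it into disjoint partitions, and grows one tree per partition so that no tree sees another tree's instances; build_tree is again a placeholder for any tree-growing routine.

import numpy as np

def disjoint_partitions(X, y, n_parts, rng):
    """Shuffle the indices and split them into n_parts non-overlapping subsets."""
    idx = rng.permutation(len(y))
    return [(X[part], y[part]) for part in np.array_split(idx, n_parts)]

def fit_forest_on_partitions(X, y, build_tree, n_parts=10, random_state=0):
    """Train one tree per disjoint partition; partitions can be processed independently,
    which is where a learning-time saving would come from."""
    rng = np.random.default_rng(random_state)
    return [build_tree(Xp, yp) for Xp, yp in disjoint_partitions(X, y, n_parts, rng)]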

Study of plant diversity in the Northern Zagros forest (Case study: Marivan region)

Silvicultural operations need to take species diversity into account. For this study, the Gomarlang district in the Marivan region of the northern Zagros forest was selected. In this study 30 circular sample plots (500 m²) were sampled by a random method. In every sample plot the species and the number of trees and shrubs were recorded. In the sample plots, micro-plots of 5 m by 5 m (i.e. an area of 25 m²) were desig...

Full text

Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees

Random forest can achieve high classification performance through a classification ensemble with a set of decision trees that grow using randomly selected subspaces of data. The performance of an ensemble learner is highly dependent on the accuracy of each component learner and the diversity among these components. In random forest, randomization can cause the occurrence of bad trees and may incl... (a hedged sketch of selecting accurate, weakly correlated trees follows the link below)

Full text
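The entry above does not specify its selection procedure, so the following is only a hedged greedy sketch of the general idea: rank trees by validation accuracy and keep those whose predictions agree only weakly with the trees already kept. The function name, the 0.95 agreement threshold and the array-shape conventions are arbitrary illustrative choices, not the cited paper's method.

import numpy as np

def select_uncorrelated_trees(tree_preds, y_val, n_keep, max_agreement=0.95):
    """tree_preds: array of shape (n_trees, n_val) holding each tree's validation predictions."""
    acc = (tree_preds == y_val).mean(axis=1)      # per-tree validation accuracy
    order = np.argsort(acc)[::-1]                 # most accurate trees first
    kept = [order[0]]
    for t in order[1:]:
        if len(kept) == n_keep:
            break
        # pairwise agreement with trees already kept, used as a cheap correlation proxy
        agreement = max(np.mean(tree_preds[t] == tree_preds[k]) for k in kept)
        if agreement < max_agreement:             # keep only weakly correlated trees
            kept.append(t)
    return kept                                   # indices of the selected trees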


Journal:

Volume   Issue

Pages  -

Publication date: 2005